Tags: data science


  1. CUDA 13.2 brings full support for CUDA Tile on Ampere, Ada, and Blackwell architectures, along with enhancements to cuTile Python, including recursive functions, closures, and custom reductions. Core updates include improved memory transfer APIs, a reduced LMEM footprint on Windows, and a shift to MCDM for better compatibility. Math libraries gain experimental Grouped GEMM with MXFP8 and FP64-emulated cuSOLVERD. Developer tools see updates to Nsight Python, Nsight Compute, and Nsight Systems, and CCCL 3.2 adds a modern C++ runtime. CuPy also gains support for CUDA 13 and stream sharing.
  2. The New Stack encourages its readers to contribute to Towards Data Science (TDS), a leading platform for data science and AI. Recognizing the increasing convergence of cloud infrastructure, DevOps, and AI engineering, the article invites practitioners to share their experiences building and deploying AI systems. Successful TDS submissions are technically detailed, timely, and specific. Authors can also benefit from editorial support, promotion, and possible payment, while building their reputation within the AI community.
  3. This tutorial explores how to use LLM embeddings as features in time series forecasting models. It covers generating embeddings from time series descriptions, preparing data, and evaluating the performance of models with and without LLM embeddings.
  4. This course takes you from Python fundamentals to AI Agent development, covering core Python, NumPy, Pandas, SQL, Flask, FastAPI, LLMs, and open-source models via HuggingFace.
  5. PCA and t-SNE are popular dimensionality reduction techniques used for data visualization. This tutorial compares PCA and t-SNE, highlighting their strengths and weaknesses, and provides guidance on when to use each method.

    This article from Machine Learning Mastery discusses when to use Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction and data visualization. Here's a summary of the key points:

    * **PCA is a linear dimensionality reduction technique.** It aims to find the directions of greatest variance in the data and project the data onto those directions. It's good for preserving global structure but can distort local relationships. It's computationally efficient.
    * **t-SNE is a non-linear dimensionality reduction technique.** It focuses on preserving the local structure of the data, meaning points that are close together in the high-dimensional space will likely be close together in the low-dimensional space. It excels at revealing clusters but can distort global distances and is computationally expensive.
    * **Key Differences:**
        * **Linearity vs. Non-linearity:** PCA is linear, t-SNE is non-linear.
        * **Global vs. Local Structure:** PCA preserves global structure, t-SNE preserves local structure.
        * **Computational Cost:** PCA is faster, t-SNE is slower.
    * **When to use which:**
        * **PCA:** Use when you need to reduce dimensionality for speed or memory efficiency, and preserving global structure is important. Good for data preprocessing before machine learning algorithms.
        * **t-SNE:** Use when you want to visualize high-dimensional data and reveal clusters, and you're less concerned about preserving global distances. Excellent for exploratory data analysis.
    * **Important Considerations for t-SNE:**
        * **Perplexity:** A key parameter that roughly sets how many neighbors each point takes into account, and so controls the balance between local and global aspects of the embedding. Experiment with different values.
        * **Randomness:** t-SNE is a stochastic algorithm, so results vary from run to run. Run it several times with different random seeds and trust only the structure that appears consistently.
        * **Interpretation:** Distances in the t-SNE plot should not be interpreted as true distances in the original high-dimensional space.

    In essence, the article advises choosing PCA for preserving overall data structure and speed, and t-SNE for revealing clusters and local relationships, understanding its limitations regarding global distance interpretation.
  6. A gentle introduction to Causal Machine Learning, covering the core concepts, differences from traditional ML, and practical applications with Python.
  7. A guide to essential data visualization techniques for data scientists, covering plots like scatter plots, line plots, histograms, box plots, heatmaps, and more, with explanations of when and how to use them effectively.
  8. Strong statistical understanding is crucial for data scientists to interpret results accurately, avoid misleading conclusions, and make informed decisions. It's a foundational skill that complements technical programming abilities.

    * **Statistical vs. Practical Significance:** Don't automatically act on statistically significant results. Consider if the effect size is meaningful in a real-world context and impacts business goals.
    * **Sampling Bias:** Be aware that your dataset is rarely a perfect representation of the population. Identify potential biases in data collection that could skew results.
    * **Confidence Intervals:** Report confidence intervals alongside point estimates to communicate the uncertainty in your estimates. A wide interval signals high uncertainty, which a larger sample can reduce.
    * **Interpreting P-Values:** A p-value is the probability of observing results at least as extreme as yours *if* the null hypothesis is true; it is *not* the probability that the hypothesis is true. Always report it alongside an effect size.
    * **Type I & Type II Errors:** Understand the risks of false positives (Type I) and false negatives (Type II) in statistical testing. Small samples reduce statistical power and so raise the risk of Type II errors.
    * **Correlation vs. Causation:** Correlation does not equal causation. Identify potential confounding variables that might explain observed relationships. Randomized experiments (A/B tests) are the most reliable way to establish causation.
    * **Curse of Dimensionality:** Adding more features doesn't always improve model performance. High dimensionality can lead to data sparsity, overfitting, and reduced model accuracy. Feature selection and dimensionality reduction techniques are important.
  9. This article covers five Python scripts designed to automate impactful feature engineering tasks, including encoding categorical features, transforming numerical features, generating interactions, extracting datetime features, and selecting features automatically.
  10. This article details seven pre-built n8n workflows designed to streamline common data science tasks, including data extraction, cleaning, model training, and deployment.
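
For the PCA vs. t-SNE comparison in item 5, here is a minimal scikit-learn sketch that runs both methods on the library's bundled digits dataset. The dataset, the 500-sample subset, the 30-component PCA step before t-SNE (a preprocessing pattern the scikit-learn documentation recommends for high-dimensional input), and the perplexity and seed values are all illustrative assumptions, not taken from the article.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumed example data: 8x8 digit images, 64 features per sample.
# A 500-sample subset keeps the t-SNE runs quick.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]
X_scaled = StandardScaler().fit_transform(X)

# PCA: linear and deterministic. explained_variance_ratio_ gives the share of
# total variance each component keeps, a rough measure of global structure retained.
pca_2d = PCA(n_components=2)
X_pca = pca_2d.fit_transform(X_scaled)
print("Variance kept by 2 PCs:", pca_2d.explained_variance_ratio_.sum())

# Assumed pipeline: PCA first to cut noise and cost, then t-SNE for visualization.
X_reduced = PCA(n_components=30, random_state=0).fit_transform(X_scaled)

# t-SNE: non-linear and stochastic. Vary perplexity and the random seed, and
# only trust cluster structure that persists across runs.
tsne_runs = {}
for perplexity in (10, 30):
    for seed in (0, 1):
        tsne = TSNE(n_components=2, perplexity=perplexity,
                    init="random", random_state=seed)
        tsne_runs[(perplexity, seed)] = tsne.fit_transform(X_reduced)
```

To compare the results, scatter-plot `X_pca` and each entry of `tsne_runs` colored by `y`. Clusters that appear in every t-SNE run are the ones worth trusting, and the gaps between them in any single run should not be read as distances.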
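
Several points in item 8 (practical vs. statistical significance, confidence intervals, p-values, effect sizes) can be seen together in one small simulation. The sketch below assumes a hypothetical A/B test with a tiny true lift and a very large sample, uses Welch's t-test from SciPy, builds the 95% confidence interval by hand with the Welch–Satterthwaite degrees of freedom, and reports Cohen's d as the effect size. All numbers are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical A/B data: a true lift of 0.02 on a metric with SD 1,
# measured on a very large sample in each group.
n = 200_000
control = rng.normal(loc=10.00, scale=1.0, size=n)
treatment = rng.normal(loc=10.02, scale=1.0, size=n)

# Welch's t-test. The p-value is the probability of a difference at least
# this extreme IF the null hypothesis (equal means) were true.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

# 95% confidence interval for the difference in means (Welch approximation).
diff = treatment.mean() - control.mean()
var_t = treatment.var(ddof=1) / treatment.size
var_c = control.var(ddof=1) / control.size
se = np.sqrt(var_t + var_c)
df = (var_t + var_c) ** 2 / (
    var_t**2 / (treatment.size - 1) + var_c**2 / (control.size - 1)
)
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

# Effect size: Cohen's d with the pooled SD (valid form for equal group sizes).
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd

print(f"diff={diff:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f}), "
      f"p={p_value:.3g}, d={cohens_d:.3f}")
```

With this much data, the test is very likely to be statistically significant even though the effect is negligible (d of about 0.02). Whether a lift of that size matters is a business question that the p-value cannot answer.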


SemanticScuttle - klotz.me: tagged with "data science"
